require(ggplot2)
require(GGally)
require(ggmap)
require(qdata)
require(survival)
data(italy)
data(bottlecap)
data(istat)
The ggplot2 community is flourishing. Some packages build directly on the ggplot2 grammar, the so-called ggplot2 extensions, like GGally; others are designed to work alongside ggplot2, like ggmap. All of them contribute to the growth of the ggplot2 ecosystem, and you too can create your own stats, geoms and positions and distribute them in a package, which then becomes a ggplot2 extension. You can find a list of the official ggplot2 extensions at www.ggplot2-exts.org. These packages are available on CRAN and/or on GitHub. Enjoy them!
In this chapter we will look at two popular packages linked to ggplot2:
- GGally, an extension of ggplot2 for correlation matrix and survival plots;
- ggmap, for spatial visualization with ggplot2.
GGally is a convenient package built upon ggplot2 for correlation matrix and survival plots, written and maintained by Barret Schloerke. It reduces the complexity of combining geometric objects with transformed data.
GGally extends ggplot2 by providing several functions including:
- ggcorr(): for pairwise correlation matrix plots;
- ggpairs(): for scatterplot matrices;
- ggsurv(): for survival plots.
Assuming the package is already installed, GGally must first be loaded:
require(GGally)
The function ggcorr() draws a correlation matrix plot using ggplot2.
Let us see an example using the bottlecap dataset, which contains measurements of the mean diameter of the caps produced by the 8 cavities of a forging machine during the quality control phase. Suppose we want to know whether the measurements from the 8 cavities are correlated, in order to check whether the number of cavities that need to be measured can be reduced.
ggcorr(data = bottlecap, palette = "RdBu", label = TRUE)
The label argument is set to TRUE to add the correlation coefficients to the plot.
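The bottlecap data ship with the qdata package, so the sketch below swaps in the built-in mtcars data to show what ggcorr() is drawing: the matrix returned by cor() holds the same pairwise Pearson coefficients that the plot encodes as colours. It also reflects our reading of the GGally documentation that palette only takes effect when nbreaks bins the coefficients into classes; without nbreaks, the low/mid/high gradient is used instead. The column choice and nbreaks = 5 are arbitrary.

```r
library(ggplot2)
library(GGally)

# mtcars (built-in) stands in for bottlecap, which comes with qdata
num_vars <- mtcars[, c("mpg", "disp", "hp", "wt", "qsec")]

# The coefficients ggcorr() visualises: pairwise Pearson correlations
cm <- cor(num_vars, use = "pairwise.complete.obs")
round(cm, 2)

# palette is applied only when nbreaks groups the coefficients into classes;
# label_round controls the decimals of the printed coefficients
p <- ggcorr(num_vars, nbreaks = 5, palette = "RdBu",
            label = TRUE, label_round = 2)
p
```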
ggpairs(): for scatterplot matrices
The ggpairs() function contains templates for different plots to be combined into a plot matrix. It is a nice alternative to the more limited pairs() function of the graphics package.
ggpairs(data=istat, # dataframe with variables
columns = 2:4, # columns to be used to make plots
title="Matrix Plot of istat data") # title of the plot
Plots like the one above are very helpful, among other things, in the pre-processing stage of a classification problem, where you want to analyze your predictors given the class labels. It is particularly convenient that ggplot2 aesthetics such as colour, shape, size and alpha can be passed through the mapping argument:
ggpairs(data=istat, # data.frame with variables
mapping = aes(colour=Area), # aesthetic mapping (besides x and y)
columns = 2:4, # columns to be used to make plots
title="Matrix Plot of istat data") # title of the plot
We have some control over which types of plot are used. We can choose the plot for continuous vs. continuous pairs (continuous), discrete vs. discrete pairs (discrete) and continuous vs. discrete pairs (combo). We can also use different plots for the upper triangle (upper), the diagonal (diag) and the lower triangle (lower).
ggpairs(data=istat,
mapping = aes(colour=Area),
columns=2:4,
upper=list(continuous = 'cor', discrete = 'facetbar', combo ='facethist'),
lower=list(continuous = 'smooth', discrete = 'facetbar', combo ='dot'),
diag=list(continuous = 'barDiag', discrete = 'barDiag'),
title="Matrix Plot of istat data")
For example, in the upper triangle the code above uses correlation coefficients for continuous vs. continuous pairs, faceted bar plots for discrete vs. discrete pairs and faceted histograms for continuous vs. discrete pairs.
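A sketch with the built-in iris data in place of istat, mixing two continuous columns with the factor Species so that the continuous, combo and discrete slots all come into play. The panel names used ("box_no_facet", "facethist", "densityDiag") are among those listed on the ggpairs() help page; this particular combination is only an illustration. Individual cells of the resulting ggmatrix can be extracted with [ as ordinary ggplot objects.

```r
library(ggplot2)
library(GGally)

# iris (built-in) stands in for istat; Species is the discrete column
pm <- ggpairs(
  data = iris,
  mapping = aes(colour = Species),
  columns = c(1, 2, 5),                    # Sepal.Length, Sepal.Width, Species
  upper = list(continuous = "cor", combo = "box_no_facet"),
  lower = list(continuous = "points", combo = "facethist"),
  diag  = list(continuous = "densityDiag", discrete = "barDiag"),
  title = "iris: custom upper/lower/diag panels"
)
pm

# Row 2, column 1 (lower triangle): Sepal.Width against Sepal.Length
sepal_panel <- pm[2, 1]
sepal_panel
```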
ggsurv(): for survival plots
ggsurv() produces Kaplan-Meier plots using ggplot2.
Let us see an example using the lung dataset from the survival package:
data(lung, package = "survival")
The lung dataset records the survival of patients with advanced lung cancer.
lung <- lung[, c("time", "status", "sex")] # columns 2, 3 and 5 of lung
We consider only the following variables:
- time: survival time in days;
- status: censoring status, 1 = censored, 2 = dead;
- sex: 1 = male, 2 = female.
As its first argument, ggsurv() needs a survfit object created with the survival package:
# Fit survival functions
surv <- survfit(Surv(time, status) ~ sex, data = lung)
# Plot survival curves
ggsurv(surv) +
guides(linetype = "none") +
scale_colour_discrete(name = 'Sex', breaks = c(1,2),
labels = c('Male', 'Female'))
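To see the numbers behind the curves, the sketch below prints the median survival per sex and the one-year survival estimate from the same kind of survfit object, then draws a single pooled curve with the CI and plot.cens arguments of ggsurv() switched on. It reads survival::lung directly (the full dataset, not the three-column subset above); the one-year cut-off of 365 days is an arbitrary choice.

```r
library(ggplot2)
library(GGally)
library(survival)

lung_df <- survival::lung          # full lung data shipped with survival

# Kaplan-Meier fit by sex, as in the text
fit <- survfit(Surv(time, status) ~ sex, data = lung_df)
print(fit)                          # n, events and median survival per sex
summary(fit, times = 365)           # estimated survival at day 365 per sex

# Pooled fit (no grouping), drawn with confidence bands and censoring marks
fit_all <- survfit(Surv(time, status) ~ 1, data = lung_df)
p <- ggsurv(fit_all, CI = TRUE, plot.cens = TRUE,
            xlab = "Days", ylab = "Survival probability",
            main = "Kaplan-Meier estimate, all patients")
p
```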
ggmap is a package for spatial visualization with ggplot2, written by David Kahle and Hadley Wickham.
In particular, ggmap enables such visualization by combining the spatial information of static maps from Google Maps, OpenStreetMap, Stamen Maps or CloudMade Maps with the layered grammar of graphics implementation of ggplot2.
ggmap plots have the same elements as ggplot2 plots, but some of them are fixed to map components:
- the x aesthetic is fixed to longitude;
- the y aesthetic is fixed to latitude.
The basic idea driving ggmap is to take a downloaded map image, plot it as a context layer using ggplot2, and then plot additional content layers of data, statistics, or models on top of the map.
In ggmap this process is broken into two pieces:
- get_map(), which downloads the map image;
- ggmap(), which plots the downloaded image with ggplot2.
Let us see an example.
We want to show the positions of the most important Italian cities.
# download a satellite map of Italy (recent ggmap versions need a Google API key set with register_google() first)
italy_map <- get_map(location="Italy", zoom = 6, maptype = "satellite")
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=Italy&zoom=6&size=640x640&scale=2&maptype=satellite&language=en-EN&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=Italy&sensor=false
The location argument is given here as an address; it can also be given as a longitude/latitude pair.
The zoom argument sets the map zoom level, an integer from 3 (continent) to 21 (building); the default is 10 (city). Zoom level 6 shows a whole country.
The maptype argument specifies the map theme. Options include “terrain”, “toner”, “watercolor”, “roadmap”, “satellite” and “hybrid” (“toner” and “watercolor” are Stamen styles; “roadmap”, “satellite” and “hybrid” are Google styles).
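A rough way to see why zoom level 6 shows a whole country: Google and the other tile providers use the Web Mercator scheme, in which the full 360° of longitude is 256 × 2^zoom pixels wide. The helper below is our own (hypothetical, not part of ggmap) and computes the longitude span of a 640-pixel-wide image, the size=640x640 requested in the URL above; latitude spans shrink towards the poles and are not covered. The city longitudes are the ones printed by head(italy) in this chapter.

```r
# Hypothetical helper (not part of ggmap): longitude span, in degrees, of a
# Web Mercator map image `width_px` pixels wide at a given zoom level.
# At zoom z the full 360 degrees of longitude are 256 * 2^z pixels wide.
lon_span <- function(zoom, width_px = 640) {
  width_px * 360 / (256 * 2^zoom)
}

zooms <- c(3, 6, 10, 21)
spans <- data.frame(zoom = zooms, lon_span_deg = signif(lon_span(zooms), 3))
print(spans)

# Longitudes of the six cities printed by head(italy)
city_lon <- c(15.798996, 14.655997, 7.315003, 10.919995, 17.123337, 16.100040)
diff(range(city_lon))                  # east-west extent of these cities
diff(range(city_lon)) < lon_span(6)    # do they fit in one zoom-6 image?
```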
The map is then drawn with the ggmap() function:
ggmap(italy_map)
We can then add the locations of the most important cities, stored in the italy dataset:
head(italy)
## city lat lon pop region
## 1 Potenza 40.64200 15.798996 69060.0 Basilicata
## 2 Campobasso 41.56300 14.655997 50762.0 Molise
## 3 Aosta 45.73700 7.315003 34062.0 Valle d'Aosta
## 4 Modena 44.65003 10.919995 175034.5 Emilia-Romagna
## 5 Crotone 39.08334 17.123337 59313.5 Calabria
## 6 Vibo Valentia 38.66659 16.100040 32168.0 Calabria
ggmap(italy_map) +
  geom_point(aes(colour = region), data = italy) +
  geom_text(aes(label = city, colour = region), data = italy, size = 4,
            check_overlap = TRUE, hjust = 0, nudge_x = 0.05) +
  ggtitle("Map of the most important Italian cities") +
  labs(colour = "Region") +
  theme(axis.title = element_blank(),
        plot.title = element_text(size = 22, face = "bold"),
        legend.title = element_text(size = 16))
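If no Google API key is at hand (recent ggmap versions require one, registered with register_google(), before get_map() can download Google tiles), the content layers can still be checked with plain ggplot2. The sketch below re-enters by hand the six rows printed by head(italy) and draws the same point and label layers on longitude/latitude axes, with coord_quickmap() approximating the map projection; there is no background map image.

```r
library(ggplot2)

# The six rows printed by head(italy), typed in by hand
cities <- data.frame(
  city   = c("Potenza", "Campobasso", "Aosta", "Modena", "Crotone", "Vibo Valentia"),
  lat    = c(40.64200, 41.56300, 45.73700, 44.65003, 39.08334, 38.66659),
  lon    = c(15.798996, 14.655997, 7.315003, 10.919995, 17.123337, 16.100040),
  region = c("Basilicata", "Molise", "Valle d'Aosta", "Emilia-Romagna",
             "Calabria", "Calabria")
)

# x = longitude and y = latitude, mirroring what ggmap() fixes for us
p <- ggplot(cities, aes(x = lon, y = lat)) +
  geom_point(aes(colour = region)) +
  geom_text(aes(label = city, colour = region), size = 4,
            check_overlap = TRUE, hjust = 0, nudge_x = 0.05) +
  coord_quickmap() +          # approximate Mercator aspect ratio, no tiles
  labs(x = "Longitude", y = "Latitude", colour = "Region")
p
```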